Bearing fault detection by using graph autoencoder and ensemble learning

The research and application of bearing fault diagnosis techniques are crucial for enhancing equipment reliability, extending bearing lifespan, and reducing maintenance expenses. Nevertheless, most existing methods encounter challenges in discriminating between signals from machines operating under normal and faulty conditions, leading to unstable detection results. To tackle this issue, the present study proposes a novel approach for bearing fault detection based on graph neural networks and ensemble learning. Our key contribution is a novel stochasticity-based compositional method that transforms Euclidean-structured data into a graph format for processing by graph neural networks, with feature fusion and a newly proposed ensemble learning strategy for outlier detection specifically designed for bearing fault diagnosis. This approach marks a significant advancement in accurately identifying bearing faults, highlighting our study's pivotal role in enhancing diagnostic methodologies.

adjusts the model structure to suit different data types and features. However, it may have limitations in dealing with large and complex data. Webb et al. 43 introduced a multi-strategy ensemble learning approach that combines different ensemble learning techniques to achieve better performance and generalization capability. This method is able to adapt to different data types and task requirements, and is robust and scalable. Additionally, Xu et al. 39 proposed a forest fire detection system that utilizes various machine learning algorithms, such as random forests, support vector machines, and neural networks, to construct detection models and provide high accuracy. However, implementing this system may require certain technical expertise in machine learning and software development.
Outlier detection, also known as anomaly detection, is a technique in machine learning and data mining that aims to identify data objects whose behavior differs from what is expected. These objects, referred to as outliers, fundamentally differ from the normal behavior pattern of the data. Unlike noise, which represents random errors and variance in the observed data, outliers in production machines deviate significantly from the rest of the data. Bearing failure itself is considered an anomaly, making outlier detection a relevant research direction in machine learning. Therefore, we can utilize outlier detection techniques to address the problem of bearing failure. Additionally, ensemble learning, which combines multiple algorithms, can enhance the performance, robustness, and stability of models. In this study, we applied well-established ensemble learning techniques from the machine learning field to the domain of bearing fault diagnosis. Specifically, we selected five mature outlier detection algorithms as the base detectors.
Graph AutoEncoder 44 (GAE) is a widely used graph neural network-based method for outlier detection. It computes an outlier factor for each data object and ranks the objects in descending order of that factor to determine outliers. AutoEncoder 45 (AE) is a type of multilayer feedforward neural network in which the input and output layers have the same number of nodes and the hidden layer has a relatively small number of nodes. AutoEncoder is employed for outlier detection by learning the feature representation of normal data and identifying abnormal data that deviate significantly from it. Local Outlier Factor 46 (LOF) is an unsupervised anomaly detection algorithm that measures the local density deviation of a given data point in relation to its neighborhood. The degree of anomaly of each point is determined by comparing its density with that of its neighbors. Connectivity-Based Outlier Factor 47 (COF) is another algorithm used for outlier detection, which assesses the degree of outlierness based on the connectivity between data points. The COF value of each data point is calculated by measuring the connectivity between the data point and its nearest neighbors, as well as the average connectivity among all points in its neighborhood. K-Nearest Neighbors 48 (KNN) is a classical outlier detection algorithm that assigns an outlier score to each data point based on its K nearest neighbors. The core idea behind KNN is that outliers have sparser neighborhoods, while normal data points have denser neighborhoods. In KNN, a data point's K nearest neighbor data points should have relatively small distances, whereas the distance between that data point and its K + 1 nearest neighbor data points should be relatively large.
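To make the distance-based idea concrete, the following is a minimal sketch of a KNN outlier score, assuming the score of a point is the distance to its k-th nearest neighbor. The function name knn_outlier_scores and the toy points are illustrative and do not come from the original implementation.

```python
import math

def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    A larger score means a sparser neighborhood, i.e. a more likely outlier.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Four points in a tight cluster and one isolated point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)
most_anomalous = scores.index(max(scores))
```

In this toy example the isolated point receives the largest score, because its nearest neighbors are far away.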

Methodology
In this study, we employed ensemble learning and graph neural network techniques, which are commonly used in machine learning, to address the issue of bearing fault diagnosis. Our model consists of three modules: a graph generation module, a feature fusion module, and a bearing fault detection module. This section provides a detailed description of these three modules. The core idea of our method is to convert the original Euclidean dataset into an adjacency matrix A using the randomness-based graph construction module. Then, the original dataset X and the adjacency matrix A are fed into the feature aggregation module to generate aggregated neighborhood features, which form the matrix Z. The matrix Z is then fed into an ensemble learning-based anomaly detection module (COF, LOF, GAE, AE, and KNN) for bearing fault detection, resulting in the anomaly matrix. The base detectors are ranked in descending order of detection ability, the top-S are selected, and the final outlier value of each node is obtained by averaging the outlier values produced by these top-S base detectors. The structure of BFDGE is shown in Fig. 1.

Random connections-based graph construction method
Graph generation is the process of converting each object in a Euclidean dataset into a node and organizing them into graph data. This process involves the following steps: a) importing the dataset X and initializing its adjacency matrix A; b) assigning a value of 1 to all diagonal elements of matrix A to represent the self-connection of each node; c) selecting one object as the root node, randomly choosing k objects from the remaining set, and connecting the root node to these nodes by creating directed edges.

Euclidean distance calculation
The Euclidean distance, also known as the Euclidean metric, represents the distance between two points in Euclidean space. In an n-dimensional Euclidean space, it is calculated as the square root of the sum of the squared differences over the n dimensions: $d(X_i, X_j) = \sqrt{\sum_{l=1}^{n} (x_{il} - x_{jl})^2}$.
Assuming X_i, X_j ∈ X and X_i ≠ X_j, a root node i is selected and a parameter k is set. Then k nodes are randomly selected from X. If the selected k nodes include the root node i, the selection is repeated. The selected k nodes are stored in the random neighbor set N_k(X_i).

Constructing the adjacency matrix
The Euclidean distance from root node i to any point in its set of random neighbors N_k(X_i) is normalized to give the weight of the directed edge from root node i to that point. When X_j ∉ N_k(X_i), the weight between X_i and X_j is 0. When X_j ∈ N_k(X_i), the weight between X_i and X_j is as shown in Eq. (2). We represent the resulting graph through the adjacency matrix A. In order to preserve the characteristics of the root node, the diagonal of the adjacency matrix is set to 1. Please note that W(X_i, X_j) does not necessarily equal W(X_j, X_i). The structure of the graph generation model is shown in Fig. 2.
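The construction steps above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function name random_link_adjacency is hypothetical, and the normalization (each distance divided by the sum of distances to the k random neighbors) is an assumption, since the exact form of the weight is not restated in this section.

```python
import math
import random

def random_link_adjacency(X, k, seed=0):
    """Build a directed, weighted adjacency matrix from random neighbors."""
    rng = random.Random(seed)
    m = len(X)
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        A[i][i] = 1.0  # self-connection preserves the root node's features
        candidates = [j for j in range(m) if j != i]  # never pick the root itself
        neighbors = rng.sample(candidates, k)
        dists = {j: math.dist(X[i], X[j]) for j in neighbors}
        total = sum(dists.values())
        for j, d in dists.items():
            # Assumed normalization: share of the total neighbor distance.
            A[i][j] = d / total if total > 0 else 1.0 / k
    return A

# Hypothetical two-dimensional dataset with distinct objects.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
A = random_link_adjacency(X, k=3)
```

Because each root node draws its own random neighbors, A[i][j] and A[j][i] generally differ, which matches the remark that W(X_i, X_j) need not equal W(X_j, X_i).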

Aggregating neighbor node characteristics via GNN
In this study, we propose a graph autoencoder (GAE)-based approach for fusing node features in a Euclidean dataset. Our approach generates a new matrix Z by aggregating neighboring node features. The objective is to address the limitations of existing outlier detection algorithms, specifically in identifying outliers within normal target regions or those mixed around dense clusters. By reconstructing the original dataset, we can accurately isolate these outliers, thereby improving the accuracy and robustness of outlier detection and facilitating downstream tasks. The primary advantage of our approach lies in its adaptive ability to capture the complex structure of the dataset and fuse node features using a GAE, effectively extracting the latent features.

Eigenvalue transfer:
Network structure:
Loss function:
During the training process of the graph autoencoder (GAE), we utilize the gradient descent algorithm to update the GAE weights W(0), W(1) and bias vectors b(0), b(1). The structure of the feature fusion model is shown in Fig. 3.
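As a rough illustration of how neighbor features can be aggregated and reconstructed, the sketch below runs one forward pass of a two-layer graph autoencoder. It rests on several assumptions that are not confirmed by this section: a GCN-style propagation (A multiplied by the features), a ReLU activation in the encoder, and a mean squared reconstruction error. The names gae_forward and reconstruction_loss are hypothetical, and the gradient descent update of W(0), W(1), b(0), b(1) is omitted.

```python
import random

def matmul(P, Q):
    """Plain matrix product of two lists of lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def add_bias(M, b):
    return [[v + bj for v, bj in zip(row, b)] for row in M]

def relu(M):
    return [[max(0.0, v) for v in row] for row in M]

def gae_forward(A, X, W0, b0, W1, b1):
    """Encode by aggregating neighbors (A @ X), then decode back to input size."""
    Z = relu(add_bias(matmul(matmul(A, X), W0), b0))      # fused features
    X_hat = add_bias(matmul(matmul(A, Z), W1), b1)        # reconstruction
    return Z, X_hat

def reconstruction_loss(X, X_hat):
    """Mean squared error between the input and its reconstruction."""
    m, n = len(X), len(X[0])
    return sum((x - xh) ** 2 for rx, rxh in zip(X, X_hat)
               for x, xh in zip(rx, rxh)) / (m * n)

# Hypothetical 3-node graph with 2 features per node and 4 hidden units.
A = [[1.0, 0.6, 0.4], [0.5, 1.0, 0.5], [0.3, 0.7, 1.0]]
X = [[0.2, 1.0], [0.4, 0.8], [3.0, -2.0]]
rng = random.Random(0)
hidden = 4
W0 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(2)]
b0 = [0.0] * hidden
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
b1 = [0.0] * 2
Z, X_hat = gae_forward(A, X, W0, b0, W1, b1)
loss = reconstruction_loss(X, X_hat)
```

In a full training loop, the loss would be minimized by gradient descent, and the matrix Z produced after training would be passed to the base detectors.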
In this study, we employ a variety of base detectors, including traditional methods, deep learning-based methods, and the latest graph neural network-based methods. Our objective is to provide a more comprehensive and accurate solution for outlier detection.
The selection of base detectors is a crucial and challenging aspect of ensemble learning, as their performance directly impacts the performance of the ensemble model. Ideally, the base detectors should demonstrate high individual performance and complement each other, performing differently in different subspaces of the data features. However, accurately predicting the applicability range of a base detector in practical scenarios is difficult due to the unknown, variable, and high-dimensional nature of data feature distributions. Therefore, it is necessary to consider diversity and complementarity when selecting base detectors in order to improve the generalization and stability of the ensemble model.

Bearing fault diagnosis through ensemble learning
To enhance the convergence speed of the algorithm, it is crucial to normalize the output of the base detectors. Normalization is a vital pre-processing step because the outputs of different base detectors vary in magnitude, which makes direct comparison and combination challenging. Equation (7) gives the normalization. The procedure subtracts the mean μ from the data and divides the result by the standard deviation σ, so that the processed data have zero mean and unit variance. The Z matrix, obtained by aggregating the features of neighboring nodes through the GAE, serves as input for each base detector mentioned above. This process facilitates the construction of an ensemble learning model with diversity. The output of each base detector in the ensemble learning model is then normalized to generate an outlier matrix. The structure of the ensemble learning model is depicted in Fig. 4.
where z_i denotes the i-th object in the matrix Z generated from the original dataset X after GAE feature fusion.

Marking outlier levels
Currently, the predominant research trend in outlier detection focuses on unsupervised learning. However, the lack of labeled data makes it difficult to assess the disparity between the predicted outcome of a detector and the true degree of outlierness of unlabeled objects. To address this issue, we propose a hybrid data-based approach that combines unlabeled normal and outlier data for training. In this method, points with higher degrees of outlierness are more likely to be identified as outliers after the model learns the features and generates outputs. Algorithm 2 applies the base detectors to each object in the Z matrix and produces a set of diverse outlier values as output.
We then select the maximum of these outlier values as a measure of the degree of outlierness of the object. The matrix of label outliers for all data in Z_{m×n} is calculated from Eq. (8) and is shown in Fig. 5.
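The normalization and labeling steps described above can be sketched as follows. This is an illustrative sketch under stated assumptions: each column of the outlier matrix holds the raw output of one base detector, the population standard deviation is used for σ, and the helper names zscore_columns and label_outliers are hypothetical.

```python
import statistics

def zscore_columns(outlier_matrix):
    """Standardize each detector's column to zero mean and unit variance."""
    normalized_cols = []
    for col in zip(*outlier_matrix):
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)  # population standard deviation (assumed)
        normalized_cols.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*normalized_cols)]

def label_outliers(normalized_matrix):
    """Label each object with the maximum normalized value across detectors."""
    return [max(row) for row in normalized_matrix]

# Three objects scored by three hypothetical detectors with different scales.
raw_scores = [[0.1, 10.0, 3.0],
              [0.2, 20.0, 1.0],
              [0.9, 90.0, 2.0]]
normalized = zscore_columns(raw_scores)
labels = label_outliers(normalized)
```

After standardization the first two columns coincide even though their raw magnitudes differ by a factor of 100, which is the reason the outputs are normalized before they are compared or combined.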

Local area construction
The BFDGE algorithm requires the construction of local regions because the data objects in a dataset are correlated. It uses the detection capability of a base detector on neighboring objects to estimate its detection capability on a specific object. Hence, the algorithm calculates the detection capability of the base detector over a local region. To achieve this, the BFDGE algorithm divides the labelled outliers into clusters and identifies the cluster to which the target object belongs. We use the KNN algorithm to calculate the Euclidean distance between node z_i and its surrounding nodes, and then determine the k nearest neighbor nodes of z_i according to the magnitude of the Euclidean distance. These nearest neighbor nodes form the set of neighbors of z_i, as shown in Eq. (9), and are stored in the neighborhood cluster Ω. It is important to note that the choice of k affects the creation of the neighborhood clusters. First, as k increases, the number of nearest neighbors to be computed increases, which raises the computational complexity of the algorithm. Second, the size of k directly affects the accuracy of the prediction. When k is small, the algorithm is more sensitive and may over-fit the data, while when k is large, the algorithm is smoother and may ignore the detailed features of the data.
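A minimal sketch of the local region construction is given below, assuming the neighborhood cluster Ω of an object is simply the index set of its k nearest neighbors under the Euclidean distance. The function name local_region and the sample rows of Z are hypothetical.

```python
import math

def local_region(Z, i, k):
    """Return the indices of the k nearest neighbors of Z[i], excluding i itself."""
    others = (j for j in range(len(Z)) if j != i)
    ranked = sorted(others, key=lambda j: math.dist(Z[i], Z[j]))
    return ranked[:k]

# Hypothetical fused feature rows.
Z = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (0.5, 0.0)]
omega = local_region(Z, i=0, k=2)
```

With k = 2, the local region of the first object consists of the two closest rows; increasing k enlarges the region and therefore the cost of evaluating every base detector on it.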

Combination of base detectors
After determining the neighborhood cluster Ω of object z_i in Z, we calculate the detection capability of all base detectors on this local area, with the aim of selecting the combination of base detectors with strong detection capability for z_i.
We can obtain the outlier values corresponding to each object p_i in the neighborhood cluster Ω from the previously computed outlier matrix. These values are stored in the matrix O_{k×5}, as shown in Eq. (10). The label outliers corresponding to each object p_i in Ω can be obtained from the label outlier matrix Label_{m×1} obtained from Eq. (8), and they are stored in the matrix Q_{k×1}, as shown in Eq. (11). After that, we use the cosine similarity between each column of O_{k×5} and Q_{k×1} to derive the detection capability ϑ_{r,i} of each base detector. The better the detection capability of a base detector on p_i, the higher the cosine similarity between its output values and the label outliers of p_i. The calculation is shown in Eq. (12). With the above formula, we obtain ϑ_{r,i} for each object p_i in Ω, sort these values in descending order, and obtain the base detectors with strong detection capability for p_i by selecting the top-S base detectors.
Similarly, for all objects in Ω, we select the base detectors with strong detection capability on the local region Ω to score z_i, which yields (num_local × s) outlier values. Finally, we calculate the average of these (num_local × s) values as the final outlier value of z_i.
In this process, the ϑ_{r,i} are ranked in descending order, and the top n of the num_detector base detectors are those with high similarity scores.
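The selection step can be sketched as follows. This is a simplified illustration, not the authors' code: it computes one cosine similarity per base detector over the local region, keeps the top-S detectors, and averages only their scores for the target object, whereas the text averages over (num_local × s) values. The names select_top_detectors and final_score and the toy matrices are hypothetical.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def select_top_detectors(O, Q, s):
    """O: k x d local outlier values, Q: k label outliers.

    Returns the indices of the s detectors whose outputs are most similar
    to the label outliers, together with all similarity values.
    """
    sims = [cosine_similarity(col, Q) for col in zip(*O)]
    ranked = sorted(range(len(sims)), key=lambda r: sims[r], reverse=True)
    return ranked[:s], sims

def final_score(detector_scores, selected):
    """Average the target object's scores over the selected detectors."""
    return sum(detector_scores[r] for r in selected) / len(selected)

# Local region of 3 objects scored by 3 hypothetical detectors.
O = [[0.9, 0.1, 0.8],
     [0.2, 0.9, 0.1],
     [0.7, 0.2, 0.9]]
Q = [1.0, 0.2, 0.9]          # label outliers of the same 3 objects
selected, sims = select_top_detectors(O, Q, s=2)
score = final_score([0.6, 0.1, 0.8], selected)
```

Here the second detector disagrees with the labels on the local region, so it is dropped and the final value is the mean of the remaining two detectors' outputs.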
In Eq. (13), θ represents the outlier ratio. For the set S, we sort the elements by value, where a higher value means that the element is more likely to be an outlier. Then, we select the top n objects in S as the final outliers.
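The final thresholding can be illustrated with the short sketch below. It assumes that n is obtained by rounding θ times the number of objects, which is an assumption made for illustration; the function name flag_outliers and the score values are hypothetical.

```python
def flag_outliers(scores, theta):
    """Return the indices of the top-n scores, with n = round(theta * len(scores))."""
    n = round(theta * len(scores))  # assumed relation between theta and n
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n])

# Ten final outlier values and an outlier ratio of 20 percent.
S = [0.10, 0.90, 0.20, 0.80, 0.15, 0.30, 0.05, 0.25, 0.12, 0.18]
outlier_indices = flag_outliers(S, theta=0.2)
```

With ten objects and θ = 0.2, the two objects with the highest values are reported as outliers.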

Experiments
In this section, we provide a detailed description of the experimental design and results. Our objective was to validate the effectiveness of the method in detecting bearing faults. To achieve this, we conducted a comparative experiment, comparing our method with several state-of-the-art algorithms. The model was implemented in MATLAB R2021a. The experimental hardware consisted of a Ryzen 7 5800H 3.20 GHz CPU and 16 GB RAM, and the operating system was Microsoft Windows 11 Professional.

Introduction of the dataset
The test setup for Dataset 1, the Case Western Reserve University (CWRU) bearing dataset, included a 2 hp motor, a torque transducer/encoder, a dynamometer, and control electronics. The motor shaft was supported by the test bearings. To induce failure, faults were seeded in the motor bearings using electric discharge machining (EDM). Faults of 0.1778 mm, 0.3556 mm, and 0.5334 mm in diameter were introduced separately at the inner race, rolling element (ball), and outer race. The faulty bearing was then reinstalled into the test motor, and vibration data was recorded at 0 hp motor load (motor speed of 1797 RPM). The bearing used for this test was an SKF 6205, positioned at the drive end. Digital data was collected at 12,000 samples per second, and for drive-end bearing faults, data was also collected at 48,000 samples per second. Speed and horsepower data were collected using the torque transducer/encoder and recorded manually. Table 1 provides a summary of the CWRU datasets.
Dataset 2 was obtained from the bearing dataset of Xi'an Jiaotong University (XJTU). The experiments used LDK UER204 bearings, and the degradation vibration signals were measured under various operating conditions. The sampling frequency during data acquisition was 25.6 kHz, with a sampling interval of 1 min and a duration of 1.28 s for each sample. To assess the algorithm's robustness, one set of bearing degradation data was selected for each of the three operating conditions. Table 2 provides the distribution of the XJTU datasets.
We calculated 23 indicators in the time and frequency domains for the samples in the datasets. These indicators are more convenient for downstream tasks and help improve the quality and accuracy of the data. Additionally, they reduce modeling errors and biases, and enhance the accuracy and interpretability of the model when compared to the original dataset.
The sequence x(n) represents a set of discrete data points, and x̄ denotes its arithmetic mean. The size of the sequence, or the number of data points, is denoted as N. The sequence x_i(n), where i ranges from 0 to 2^j − 1, denotes the decomposition coefficient sequence of the i-th frequency band obtained by WPD at decomposition level j.
Wavelet Packet Decomposition (WPD) is an extension of the wavelet transform that offers a more comprehensive signal analysis. It achieves this by decomposing the signal into more detailed frequency bands than traditional wavelet analysis. WPD is highly regarded in vibration signal analysis due to its effectiveness in extracting characteristic fault frequencies from noisy signals, which improves the accuracy of fault diagnosis for rotating machinery. In the context of bearing fault diagnosis, WPD enables the extraction of subtle features from bearing vibration signals. These features indicate the early stages of faults, which may not be detectable using other methods. The ability of WPD to perform time-frequency analysis makes it particularly suitable for diagnosing mechanical faults in bearings under varying load and speed conditions, because the frequency content of the signal changes over time in such cases.
Meanwhile, IMF_i(n) refers to the i-th data sequence resulting from EEMD, a separate decomposition method that operates at level NI. EEMD, short for Ensemble Empirical Mode Decomposition, is an advanced signal processing technique that improves upon the Empirical Mode Decomposition (EMD) method. EEMD addresses the issue of mode mixing observed in EMD by adding white noise to the data over multiple iterations. This iterative process enhances the robustness and reliability of the decomposition, enabling more accurate analysis of complex, non-linear, and non-stationary signals. Due to its adaptability and efficiency in handling real-world complex data, EEMD finds extensive applications in various fields including signal processing, time-series analysis, and even environmental and medical data analysis.
Based on Table 3, the indexes are calculated for each sample. Four steps are required.
1. Nine time-domain indexes are calculated as follows:
2. E_WPD is obtained by calculating the WPD energy (parameters j = 3 and wavelet Db20).

Table 3. Indexes and the calculation formulas (entries include the standard deviation and 8. shape factor).
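To illustrate how time-domain indexes of this kind are computed from a vibration sample, the sketch below evaluates a few of them using their common textbook definitions. These definitions are assumptions for illustration: the standard deviation uses the N − 1 denominator, the shape factor is taken as the RMS divided by the mean absolute value, and the function name time_domain_indexes is hypothetical. The exact formulas used in the study are those of Table 3.

```python
import math

def time_domain_indexes(x):
    """Compute a few common time-domain indexes of a discrete signal x(n)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))  # sample std (assumed)
    rms = math.sqrt(sum(v * v for v in x) / n)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "shape_factor": rms / mean_abs,  # assumed definition: RMS / mean |x|
    }

# A short square-wave-like sample (hypothetical data).
signal = [1.0, -1.0, 1.0, -1.0]
indexes = time_domain_indexes(signal)
```

For this alternating signal the RMS and the mean absolute value are both 1, so the shape factor equals 1; impulsive fault signatures tend to raise it above that value.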
The distribution of the selected dataset is illustrated in Fig. 6. We performed PCA dimensionality reduction on the original dataset for visualization; normal objects are represented by blue hollow circles and abnormal objects by red solid circles. It is evident that a significant portion of the outliers are intermingled with the normal data, making them difficult to separate.

Comparison methods
In this paper, we investigate the problem of bearing fault detection and approach it as an anomaly detection problem in the field of artificial intelligence. To validate the effectiveness of our proposed algorithm, we conducted comparison experiments using state-of-the-art outlier detection algorithms. To ensure robust conclusions, we selected and compared five different types of state-of-the-art outlier detection algorithms. These algorithms are commonly used in the field of outlier detection and have been extensively studied in the literature, demonstrating their effectiveness. Notably, the GAN-based approach proposed by Du et al. 49 exhibits greater novelty. By comparing our algorithm with these established methods, we aim to evaluate its performance, strengths, and weaknesses, and to further enhance and optimize it. The experimental results demonstrate that our algorithm performs exceptionally well across all metrics, confirming its effectiveness and feasibility in detecting bearing faults. Table 4 presents a list of the comparison algorithms and their respective types.
Since the experiments involve multiple algorithms that require different hyperparameters, Table 5 describes in detail the parameter settings used for each algorithm in the experiments.
AUC (Area Under the Curve) is a widely used metric for evaluating the performance of binary classification models. It measures the average ability of the classifier to distinguish between positive and negative cases by calculating the area under the Receiver Operating Characteristic (ROC) curve. AUC is considered one of the most important metrics for assessing the prediction accuracy of a model.
False Alarm Rate (FAR), also known as the false positive rate, measures the probability that the model misclassifies negative cases as positive cases, that is, the fraction of negative samples that the classifier incorrectly flags as positive. Accuracy (ACC) is the ratio of samples correctly classified by the classifier to the total number of samples. It is a crucial metric for evaluating the overall performance of the classifier.
A confusion matrix is a table that presents the prediction results of a binary classification model. It consists of four values: True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). TP represents the number of samples correctly predicted as positive cases, FP represents the number of samples incorrectly predicted as positive cases, TN represents the number of samples correctly predicted as negative cases, and FN represents the number of samples incorrectly predicted as negative cases. By utilizing the confusion matrix, we can calculate important metrics such as the detection rate (DR), FAR, and ACC, and, by varying the decision threshold, the AUC. The formula for calculating AUC is shown in Eq. (19).
x_i and y_i are the horizontal and vertical coordinates of the i-th sample, respectively, and i is a positive integer; n+ and n− are the numbers of positive and negative cases, respectively; and n is the total number of samples. ACC, DR, and FAR are shown in (19), (20), and (21):
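The metrics above can be sketched in a few lines. The ACC, DR, and FAR expressions follow the standard confusion-matrix definitions. For AUC, the sketch swaps in the pairwise-ranking (Mann-Whitney) form, which gives the same value as the area under the ROC curve, rather than the coordinate-based formula referenced in the text. The function names and the toy labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = faulty (positive) and 0 = normal (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def detection_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "DR": tp / (tp + fn),    # detection rate (true positive rate)
        "FAR": fp / (fp + tn),   # false alarm rate (false positive rate)
    }

def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1, 0.2, 0.8]
y_pred = [1 if s > 0.45 else 0 for s in scores]  # hypothetical threshold
metrics = detection_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
```

In this toy example one faulty sample is missed and one normal sample is flagged, so DR is 2/3 and FAR is 1/3, while the AUC of 8/9 reflects that only one positive-negative pair is ranked in the wrong order.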

Experimental results
In this section, we present a visual comparison of the AUC performance of our algorithm with six other algorithms for bearing fault diagnosis. The comparison is shown in a bar chart. Additionally, we provide a table that further highlights the superior performance of our algorithm on the remaining four metrics.
Table 6 displays the results of the BFDGE algorithm applied to six real bearing fault diagnosis datasets, in comparison to six other algorithms, based on four performance metrics. The table illustrates that BFDGE achieves the highest AUC and ACC values across all six datasets. Moreover, BFDGE improves substantially on the second-best algorithm in both DR and FAR on the six datasets. Additionally, BFDGE attains the lowest FAR values on all three datasets. BFDGE excels in detecting faulty samples that are mixed with normal samples, which poses a challenge for traditional distance, density, and

Robustness experiments
BFDGE involves various parameters that affect its performance. These parameters include the number of base detectors, the number of node neighbors in RandomLink, the number of nodes in the local region of each node, the number of hidden layer neurons in the graph neural network, and the learning rate. For our in-depth study, we focused on two parameters that have a significant impact on the detection results: the number of node neighbors (k) in RandomLink and the number of hidden layers in the graph neural network. We conducted 20 sets of experiments on three datasets to investigate the effects of these parameters on the performance of BFDGE. As depicted in Fig. 7, the AUC of BFDGE increases gradually with the number of nodes (k) connected to each node in RandomLink until it reaches a stable state. This observation suggests that when k is small, each node aggregates the features of too few random neighbors, which hinders BFDGE's ability to differentiate between normal and faulty objects. As k increases, the AUC values of BFDGE rise progressively on the three datasets and ultimately stabilize at the highest attained AUC.
Based on Fig. 8, setting the number of layers in the graph neural network to 3 results in the highest AUC value. As the network becomes deeper than this, the feature values of different objects become increasingly similar, leading to similar reconstruction errors in the output layer. This similarity makes it challenging to differentiate between normal and abnormal objects, a phenomenon known as over-smoothing.

Conclusion
This paper presents a novel method for detecting bearing faults using a graph neural network combined with ensemble learning. Early faults in bearings often produce characteristic signals of small amplitude and low intensity, making them inconspicuous, random, and easily masked by system interference and noise. To address this issue, we propose a graph construction method that converts the original dataset into a graph dataset and a feature aggregation module that aggregates neighboring node features. Subsequently, unsupervised bearing fault detection is performed using ensemble learning. The method converts vibration signals into graphs to establish correlations between initially independent signals. The dataset, along with the corresponding graphs, is then fed into the feature aggregation module for training, enabling fault detection through a new ensemble learning strategy. Through detailed comparisons with existing algorithms, we demonstrate that the proposed method successfully detects faulty objects within normal object regions or around dense clusters. In future work, we intend to explore new graph construction methods, graph neural networks, and loss functions to achieve even more satisfactory and stable results.

Figure 1. The entire structure for bearing fault detection.

Algorithm 2. Outlier value matrix generation. D denotes the set of base detectors; D_c denotes the c-th base detector in the set; D_c(z_i) denotes the outlier value of z_i given by the c-th base detector; iter denotes the column position of each base detector in D; init() initializes a base detector; train() trains a base detector on Z_{m×n}.

Figure 6. The distribution of the selected dataset.

Figure 7. Influence of the number of k-nearest neighbors on BFDGE.

Figure 8. Influence of the number of layers on BFDGE.

Table 6. Experimental results on real-world datasets. Significant values are in bold.